1
00:00:00,790 --> 00:00:07,320
[Music]

2
00:00:14,189 --> 00:00:09,199
[Applause]

3
00:00:15,499 --> 00:00:14,199
okay thank you so I'm as I was

4
00:00:18,450 --> 00:00:15,509
introduced here to talk about the

5
00:00:21,839 --> 00:00:18,460
universal genetic code and the way I

6
00:00:23,700 --> 00:00:21,849
normally introduce this project to pure

7
00:00:25,589 --> 00:00:23,710
biochemists is I first have to convince

8
00:00:28,019 --> 00:00:25,599
them that this is something that might

9
00:00:29,730 --> 00:00:28,029
have evolved and did not magically

10
00:00:33,150 --> 00:00:29,740
appear overnight but I think I don't

11
00:00:36,180 --> 00:00:33,160
need to convince this room of that the

12
00:00:38,670 --> 00:00:36,190
very earliest life likely used a

13
00:00:41,789 --> 00:00:38,680
genetic code that was more simple than

14
00:00:43,859 --> 00:00:41,799
the universal code we know today likely

15
00:00:46,770 --> 00:00:43,869
comprising just a handful of amino

16
00:00:49,380 --> 00:00:46,780
acids and it complexified over time

17
00:00:51,450 --> 00:00:49,390
adding more and more amino acids as it

18
00:00:54,390 --> 00:00:51,460
went there are a bunch of different

19
00:00:57,000 --> 00:00:54,400
theories for the order in which these

20
00:00:59,960 --> 00:00:57,010
amino acids were added these theories

21
00:01:03,899 --> 00:00:59,970
include the number of codons they have

22
00:01:06,450 --> 00:01:03,909
the GC richness of their codons the

23
00:01:07,950 --> 00:01:06,460
chemical simplicity of the amino acids

24
00:01:10,320 --> 00:01:07,960
themselves and whether they've been

25
00:01:13,590 --> 00:01:10,330
discovered on meteorites or synthesized

26
00:01:15,300 --> 00:01:13,600
in the lab finally we can look for clues

27
00:01:18,170 --> 00:01:15,310
in metabolism so the order in which

28
00:01:21,840 --> 00:01:18,180
they can be synthesized in biology so

29
00:01:24,060 --> 00:01:21,850
there's in excess of 60 of these

30
00:01:27,690 --> 00:01:24,070
different hypotheses for the order of

31
00:01:30,510 --> 00:01:27,700
the code's evolution and for my work

32
00:01:34,020 --> 00:01:30,520
today we're utilizing a meta analysis

33
00:01:36,270 --> 00:01:34,030
performed by Trifonov in 2004 and he

34
00:01:39,120 --> 00:01:36,280
took all of these 60 hypotheses and

35
00:01:41,880 --> 00:01:39,130
synthesized them into a consensus

36
00:01:44,180 --> 00:01:41,890
order and so this here represents a

37
00:01:48,630 --> 00:01:44,190
single hypothesis for the code's

38
00:01:50,700 --> 00:01:48,640
evolution and today I'm going to be

39
00:01:52,770 --> 00:01:50,710
interrogating that and at the same time

40
00:01:56,760 --> 00:01:52,780
interrogating a more broad hypothesis

41
00:01:58,830 --> 00:01:56,770
can early genetic codes produce proteins

42
00:02:04,219 --> 00:01:58,840
that can support functions that are

43
00:02:06,980 --> 00:02:04,229
essential for life and to do that I took

44
00:02:10,190 --> 00:02:06,990
theoretical snapshots through this

45
00:02:12,990 --> 00:02:10,200
hypothesized consensus order and

46
00:02:15,920 --> 00:02:13,000
generated libraries of randomized

47
00:02:18,780 --> 00:02:15,930
proteins that represent these different

48
00:02:21,690 --> 00:02:18,790
hypothetical codes throughout the

49
00:02:24,360 --> 00:02:21,700
evolution so my most simple

50
00:02:27,660 --> 00:02:24,370
code has only the five most ancient

51
00:02:31,350 --> 00:02:27,670
amino acids and then the nine and then

52
00:02:34,339 --> 00:02:31,360
the sixteen lacks just those final four

53
00:02:37,440 --> 00:02:34,349
amino acids and then we have a

54
00:02:40,680 --> 00:02:37,450
positive control code which is all of

55
00:02:42,839 --> 00:02:40,690
today's amino acids so the protein

56
00:02:45,960 --> 00:02:42,849
variants in each library have a

57
00:02:49,380 --> 00:02:45,970
randomized region of 80 amino acids and

58
00:02:52,979 --> 00:02:49,390
each library has a complexity of greater

59
00:02:55,890 --> 00:02:52,989
than 10 to the 12 unique sequences so
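
To put the library size in context, the arithmetic can be sketched as follows. This is only an illustration, assuming every one of the 80 randomized positions can hold any amino acid of the alphabet with no other constraints; the function name `sequence_space` is made up for this example and is not from the talk.

```python
LENGTH = 80             # randomized region in amino acids (stated in the talk)
LIBRARY_SIZE = 10**12   # lower bound on unique sequences per library (stated in the talk)


def sequence_space(alphabet_size, length=LENGTH):
    """Number of distinct sequences of a given length over an alphabet.

    Assumes every position is fully and independently randomized.
    """
    return alphabet_size ** length


for k in (5, 9, 16, 20):
    space = sequence_space(k)
    # Fraction of the theoretical space that 10^12 sequences could sample.
    coverage = LIBRARY_SIZE / space
    magnitude = len(str(space)) - 1  # order of magnitude of the space
    print(f"{k:>2} amino acids: ~10^{magnitude} sequences, coverage ~{coverage:.1e}")
```

Under these assumptions, even the five amino acid library samples only a vanishingly small fraction of its possible sequences.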

60
00:02:58,979 --> 00:02:55,900
the power of our approach is that we

61
00:03:03,630 --> 00:02:58,989
planned to compare these four libraries

62
00:03:06,990 --> 00:03:03,640
these four alphabets of this evolving

63
00:03:10,800 --> 00:03:07,000
code and see what propensity they have

64
00:03:12,780 --> 00:03:10,810
for structure and function so the first

65
00:03:15,930 --> 00:03:12,790
thing we did was interrogate for

66
00:03:18,720 --> 00:03:15,940
structure or a proxy for structure

67
00:03:21,449 --> 00:03:18,730
which is the ability to form soluble

68
00:03:23,940 --> 00:03:21,459
protein when expressed in E. coli so I

69
00:03:27,000 --> 00:03:23,950
took about two dozen individual variants

70
00:03:31,830 --> 00:03:27,010
from each library and expressed them and

71
00:03:34,890 --> 00:03:31,840
then used simple soluble versus

72
00:03:39,750 --> 00:03:34,900
insoluble as an idea of whether they can

73
00:03:42,930 --> 00:03:39,760
fold in the cell and a rough conclusion

74
00:03:46,250 --> 00:03:42,940
is that the newer alphabets the 16 and

75
00:03:49,530 --> 00:03:46,260
the 20 were most likely to be expressed

76
00:03:51,629 --> 00:03:49,540
whereas the older alphabets the 5 and

77
00:03:53,879 --> 00:03:51,639
the 9 when they were expressed were more

78
00:03:57,990 --> 00:03:53,889
likely to be soluble which was

79
00:04:00,780 --> 00:03:58,000
interesting so now we had our sometimes

80
00:04:04,410 --> 00:04:00,790
soluble proteins the next thing we

81
00:04:06,809 --> 00:04:04,420
wanted to interrogate was which of these

82
00:04:10,050 --> 00:04:06,819
alphabets or do all of these alphabets

83
00:04:13,259 --> 00:04:10,060
have a propensity for functions that are

84
00:04:15,809 --> 00:04:13,269
essential for life so one of the most

85
00:04:17,190 --> 00:04:15,819
simple functions I can select for

86
00:04:20,490 --> 00:04:17,200
because I kind of wanted to go a little

87
00:04:24,659 --> 00:04:20,500
bit easy on these alphabets is ligand

88
00:04:27,450 --> 00:04:24,669
binding and some essential ligands at

89
00:04:30,870 --> 00:04:27,460
the origin of life as Mark has already

90
00:04:32,480 --> 00:04:30,880
mentioned are ATP and GTP so these are

91
00:04:34,730 --> 00:04:32,490
essential energy

92
00:04:37,309 --> 00:04:34,740
currencies for all of life and also

93
00:04:40,279 --> 00:04:37,319
components of RNA which I think we can

94
00:04:44,629 --> 00:04:40,289
agree is somewhat central to origin of

95
00:04:47,839 --> 00:04:44,639
life so the method that I use to

96
00:04:49,850 --> 00:04:47,849
interrogate these libraries for binding

97
00:04:52,189 --> 00:04:49,860
of these cofactors is an in vitro

98
00:04:55,550 --> 00:04:52,199
evolution method again similar to Mark

99
00:04:57,559 --> 00:04:55,560
called mRNA display for those of you who

100
00:05:00,740 --> 00:04:57,569
aren't familiar with it it's a technique

101
00:05:04,850 --> 00:05:00,750
in which you physically attach

102
00:05:07,159 --> 00:05:04,860
a protein variant to its own encoding

103
00:05:09,589 --> 00:05:07,169
mRNA and this is really powerful because

104
00:05:12,680 --> 00:05:09,599
it means that you can take a library of

105
00:05:15,200 --> 00:05:12,690
trillions of unique sequences and you

106
00:05:17,210 --> 00:05:15,210
can pluck out a single protein that has

107
00:05:20,899 --> 00:05:17,220
your desired function and because it's

108
00:05:22,760 --> 00:05:20,909
attached to its own code you can decode

109
00:05:27,680 --> 00:05:22,770
it and get its sequence and you can also

110
00:05:28,730 --> 00:05:27,690
propagate it through PCR so aside from

111
00:05:31,180 --> 00:05:28,740
that it's pretty similar to the

112
00:05:35,149 --> 00:05:31,190
technique that Mark spoke about so we

113
00:05:39,290 --> 00:05:35,159
generate our RNA protein fusions we

114
00:05:43,339 --> 00:05:39,300
introduce them to some immobilized ATP

115
00:05:45,950 --> 00:05:43,349
and GTP ligands and then only the

116
00:05:50,120 --> 00:05:45,960
fusions that can be competitively eluted

117
00:05:53,270 --> 00:05:50,130
go on to form the precursor for the next

118
00:05:56,749 --> 00:05:53,280
cycle and you assume that once you have

119
00:06:02,779 --> 00:05:56,759
binding you see you observe increased

120
00:06:05,510 --> 00:06:02,789
enrichment after every round so I first

121
00:06:08,959 --> 00:06:05,520
performed this with our control alphabet

122
00:06:11,450 --> 00:06:08,969
of the extant 20 amino acids and you can

123
00:06:13,999 --> 00:06:11,460
see along the bottom here the number

124
00:06:17,270 --> 00:06:14,009
of rounds and then the percent of

125
00:06:19,610 --> 00:06:17,280
fusions that are selected per round up

126
00:06:21,680 --> 00:06:19,620
on the y axis and this was really

127
00:06:24,409 --> 00:06:21,690
encouraging to see because after every

128
00:06:27,350 --> 00:06:24,419
round we saw increased enrichment for

129
00:06:29,930 --> 00:06:27,360
ATP and GTP binding you can see that it

130
00:06:33,080 --> 00:06:29,940
is significantly above a negative

131
00:06:35,870 --> 00:06:33,090
control that I did but this was my

132
00:06:37,870 --> 00:06:35,880
positive control and I didn't know how

133
00:06:41,810 --> 00:06:37,880
the other alphabets were going to behave

134
00:06:44,040 --> 00:06:41,820
five amino acids is awfully few to expect

135
00:06:45,629 --> 00:06:44,050
much out of so I

136
00:06:48,540 --> 00:06:45,639
repeated this experiment with

137
00:06:51,779 --> 00:06:48,550
identical conditions for the three other

138
00:06:55,230 --> 00:06:51,789
ancient alphabets and was really

139
00:06:58,770 --> 00:06:55,240
surprised to see that all three of the

140
00:07:02,969 --> 00:06:58,780
reduced amino acid alphabets yielded ATP

141
00:07:06,659 --> 00:07:02,979
and GTP binders after an average of five

142
00:07:08,189 --> 00:07:06,669
rounds so it may not have

143
00:07:10,830 --> 00:07:08,199
enriched quite as high for the five

144
00:07:13,680 --> 00:07:10,840
library but that's definite enrichment

145
00:07:16,050 --> 00:07:13,690
and this was a repeatable result so

146
00:07:19,589 --> 00:07:16,060
really excited about this so the next

147
00:07:22,920 --> 00:07:19,599
stage was to find out what these

148
00:07:25,020 --> 00:07:22,930
populations of sequences are so I picked

149
00:07:28,980 --> 00:07:25,030
a round from each of the experiments for

150
00:07:30,809 --> 00:07:28,990
deep sequencing to glean a little more

151
00:07:33,209 --> 00:07:30,819
information from this experiment I

152
00:07:36,149 --> 00:07:33,219
repeated all of these rounds and then

153
00:07:38,279 --> 00:07:36,159
after I generated the fusions I split

154
00:07:41,909 --> 00:07:38,289
them into four different conditions so

155
00:07:45,600 --> 00:07:41,919
the first has ATP and GTP pooled

156
00:07:49,499 --> 00:07:45,610
together as my typical selection round

157
00:07:52,260 --> 00:07:49,509
was and then I did ATP by itself GTP by

158
00:07:56,519 --> 00:07:52,270
itself and then a no ligand control

159
00:07:59,999 --> 00:07:56,529
as my negative control I had the kind

160
00:08:03,450 --> 00:08:00,009
help of Celia Blanco and Irene Chen who

161
00:08:06,570 --> 00:08:03,460
I think are here in analyzing this data and

162
00:08:09,450 --> 00:08:06,580
the metric we came up with as a proxy for

163
00:08:12,360 --> 00:08:09,460
binding affinity was relative enrichment

164
00:08:15,420 --> 00:08:12,370
so we considered an individual sequence

165
00:08:17,790 --> 00:08:15,430
to be enriched if it had more copies

166
00:08:20,010 --> 00:08:17,800
after selection than it had prior to

167
00:08:22,800 --> 00:08:20,020
selection and we considered it to show

168
00:08:25,019 --> 00:08:22,810
relative enrichment if this

169
00:08:27,930 --> 00:08:25,029
enrichment was greater than this

170
00:08:30,420 --> 00:08:27,940
enrichment for the negative control so

171
00:08:32,370 --> 00:08:30,430
we would expect the values for relative

172
00:08:38,459 --> 00:08:32,380
enrichment to be greater than one to

173
00:08:40,709 --> 00:08:38,469
show real specific binding so this is an
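
The relative enrichment metric described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis used in the talk: the function names, the normalization by total reads per sample, and the example read counts are all hypothetical.

```python
def enrichment(count_after, total_after, count_before, total_before):
    """Fold change in a sequence's share of reads across one selection.

    Normalizing by total reads is an assumption made for this sketch;
    the talk only says "more copies after selection than prior".
    """
    frac_after = count_after / total_after
    frac_before = count_before / total_before
    return frac_after / frac_before


def relative_enrichment(ligand_enrichment, no_ligand_enrichment):
    """Enrichment with ligand divided by enrichment in the no-ligand control."""
    return ligand_enrichment / no_ligand_enrichment


# Hypothetical read counts for a single sequence (made-up numbers).
with_ligand = enrichment(count_after=500, total_after=100_000,
                         count_before=10, total_before=100_000)
no_ligand = enrichment(count_after=20, total_after=100_000,
                       count_before=10, total_before=100_000)

rel = relative_enrichment(with_ligand, no_ligand)
is_specific = rel > 1  # values above one are read as specific binding
```

A sequence that is enriched equally with and without ligand gives a value of one, which is why one serves as the threshold.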

174
00:08:42,130 --> 00:08:40,719
awful lot of information but I can walk

175
00:08:44,830 --> 00:08:42,140
you through it

176
00:08:49,000 --> 00:08:44,840
each of these circles represents a

177
00:08:52,090 --> 00:08:49,010
unique sequence on the y-axis we have

178
00:08:54,460 --> 00:08:52,100
relative enrichment on a log scale along

179
00:08:56,470 --> 00:08:54,470
the bottom we have the three different

180
00:09:00,520 --> 00:08:56,480
selection conditions per library

181
00:09:03,640 --> 00:09:00,530
color-coded up here the line here is the

182
00:09:05,110 --> 00:09:03,650
threshold for relative enrichments

183
00:09:07,720 --> 00:09:05,120
everything above this is showing

184
00:09:10,990 --> 00:09:07,730
specific binding everything below we

185
00:09:13,450 --> 00:09:11,000
don't want to think about the median

186
00:09:16,450 --> 00:09:13,460
value is shown in black and straight

187
00:09:19,480 --> 00:09:16,460
away if we look at it we can see this

188
00:09:23,470 --> 00:09:19,490
trend of increased propensity for

189
00:09:26,800 --> 00:09:23,480
binding and increased affinity of

190
00:09:30,160 --> 00:09:26,810
binding with the increased complexity of

191
00:09:33,550 --> 00:09:30,170
the alphabet but there's a plateau after

192
00:09:36,580 --> 00:09:33,560
you reach the 16 amino acid alphabet

193
00:09:40,150 --> 00:09:36,590
which suggests that those final four

194
00:09:41,890 --> 00:09:40,160
amino acids aren't adding a lot for this

195
00:09:45,810 --> 00:09:41,900
particular function in this particular

196
00:09:49,570 --> 00:09:45,820
selection if we look at the five library

197
00:09:52,450 --> 00:09:49,580
those medians are really low there's

198
00:09:54,550 --> 00:09:52,460
still some binding and the best

199
00:09:56,440 --> 00:09:54,560
binders yeah they're showing some

200
00:10:00,250 --> 00:09:56,450
relative enrichment but if we look at

201
00:10:02,620 --> 00:10:00,260
the sixteen for ATP it is several orders

202
00:10:06,670 --> 00:10:02,630
of magnitude better than the five and

203
00:10:10,090 --> 00:10:06,680
that was really exciting to see I'm

204
00:10:13,600 --> 00:10:10,100
gonna zoom in now on the best six

205
00:10:19,060 --> 00:10:13,610
binders from each library from each

206
00:10:21,010 --> 00:10:19,070
alphabet and again this really

207
00:10:23,650 --> 00:10:21,020
pronounces the difference in

208
00:10:25,780 --> 00:10:23,660
relative enrichment between the

209
00:10:28,060 --> 00:10:25,790
alphabets so you can see that you have a

210
00:10:30,700 --> 00:10:28,070
relative enrichment between one and ten

211
00:10:35,860 --> 00:10:30,710
for the five and all the way up to over

212
00:10:38,890 --> 00:10:35,870
a thousand for the sixteen but this is

213
00:10:44,530 --> 00:10:38,900
for ATP and this pattern isn't as evident

214
00:10:46,080 --> 00:10:44,540
for GTP binding so the final thing I

215
00:10:48,540 --> 00:10:46,090
wanted to do

216
00:10:50,790 --> 00:10:48,550
I wanted to talk about is

217
00:10:53,190 --> 00:10:50,800
the specificity of binding as I

218
00:10:56,730 --> 00:10:53,200
mentioned I performed the selection with

219
00:10:59,630 --> 00:10:56,740
a mixture of ligands mostly because I

220
00:11:01,890 --> 00:10:59,640
couldn't be bothered doing it twice and

221
00:11:04,260 --> 00:11:01,900
but it's quite striking if you look

222
00:11:07,410 --> 00:11:04,270
again at just these top six selected

223
00:11:09,570 --> 00:11:07,420
binders per alphabet you can see that for

224
00:11:11,910 --> 00:11:09,580
the five and the nine there isn't really

225
00:11:14,100 --> 00:11:11,920
much discrimination in binding so

226
00:11:17,210 --> 00:11:14,110
the ratio of ATP to GTP is on the y

227
00:11:20,550 --> 00:11:17,220
axis here so there's some really sloppy

228
00:11:22,920 --> 00:11:20,560
nonspecific binding happening whereas if

229
00:11:25,380 --> 00:11:22,930
you look at the sixteen or the twenty

230
00:11:28,230 --> 00:11:25,390
there's much more of a bias one way or

231
00:11:31,190 --> 00:11:28,240
the other towards ATP or GTP for the

232
00:11:35,190 --> 00:11:31,200
twenty and this suggests that as the

233
00:11:37,380 --> 00:11:35,200
chemistries of the genetic code get more

234
00:11:39,330 --> 00:11:37,390
complex they are more able to

235
00:11:41,280 --> 00:11:39,340
distinguish between the ligands that

236
00:11:43,260 --> 00:11:41,290
they're binding which is of course a

237
00:11:48,150 --> 00:11:43,270
function that is crucial for modern

238
00:11:51,060 --> 00:11:48,160
biology or any biology so I would like

239
00:11:54,120 --> 00:11:51,070
to go to some really quick take-home

240
00:11:56,850 --> 00:11:54,130
points we managed to find binders from

241
00:11:58,350 --> 00:11:56,860
the five amino acid alphabet and that is

242
00:12:01,680 --> 00:11:58,360
something we never expected to see and

243
00:12:03,660 --> 00:12:01,690
we're really excited about that we

244
00:12:05,850 --> 00:12:03,670
observed that the relative enrichments

245
00:12:08,520 --> 00:12:05,860
the proxy for binding increased with

246
00:12:11,220 --> 00:12:08,530
library complexity the best binders were

247
00:12:13,560 --> 00:12:11,230
from the sixteen library not from the

248
00:12:16,650 --> 00:12:13,570
twenty and the most selective binders

249
00:12:18,990 --> 00:12:16,660
were from the sixteen and the twenty so

250
00:12:21,660 --> 00:12:19,000
I think I can tentatively and maybe

251
00:12:25,200 --> 00:12:21,670
cheekily conclude that under some very

252
00:12:27,540 --> 00:12:25,210
specific conditions hypothetical early

253
00:12:30,960 --> 00:12:27,550
genetic codes may actually rival today's

254
00:12:32,310 --> 00:12:30,970
universal code with that I'd like to

255
00:12:38,870 --> 00:12:32,320
acknowledge people who've worked on this

256
00:12:45,600 --> 00:12:41,880
so we have time for a few questions yes

257
00:12:45,610 --> 00:13:25,320
[Music]

258
00:13:34,590 --> 00:13:27,540
that you've identified a stone in my

259
00:13:38,220 --> 00:13:34,600
shoe yes the 9 did take

260
00:13:41,280 --> 00:13:38,230
longer and I repeated it and repeated it

261
00:13:44,520 --> 00:13:41,290
and it still would consistently take a

262
00:13:48,780 --> 00:13:44,530
little longer I don't have a good answer

263
00:13:51,660 --> 00:13:48,790
because when we did get binders they are

264
00:13:56,040 --> 00:13:51,670
identifiably better

265
00:13:58,710 --> 00:13:56,050
than the 5 one caveat one technical

266
00:14:00,210 --> 00:13:58,720
caveat is that the complexity of the 9

267
00:14:02,730 --> 00:14:00,220
library at the beginning of the

268
00:14:05,670 --> 00:14:02,740
experiment was higher than the other

269
00:14:08,490 --> 00:14:05,680
three libraries so that is something

270
00:14:10,650 --> 00:14:08,500
that we need to correct for when we

271
00:14:13,080 --> 00:14:10,660
publish this and it might be that that

272
00:14:14,970 --> 00:14:13,090
actually accounts for that extra lag

273
00:14:17,400 --> 00:14:14,980
before we saw enrichment it just

274
00:14:20,250 --> 00:14:17,410
had more sequences to pare through

275
00:14:31,210 --> 00:14:20,260
before it went right up we have time for

276
00:14:39,100 --> 00:14:34,759
these experiments when they finally did

277
00:14:45,079 --> 00:14:39,110
some analysis it turned out that the

278
00:14:47,720 --> 00:14:45,089
interactions um we're in the process of